Abstract. In this paper we develop the theory of information geometry for single random matrix
models, with two goals: proving a Cramér-Rao theorem for estimators on random matrices,
and calculating the Legendre transform of pressure and entropy with respect to a metric duality.
Consequently, in the large $n$ limit we recover several quantities from free probability: Voiculescu's
conjugate variable is the tangent vector to the GUE perturbation model, giving rise to a metric which
turns out to be the free Fisher information measure; Hiai's Legendre transform of free pressure agrees
with our Legendre transform of pressure; and Speicher's covariance of fluctuations naturally arises
as the metric on the random matrix model obtained from the fluctuation functions.



0.1. Introduction. 

Inspired by the work of [AN00], we treat random $n \times n$ matrix models of the form
$\exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))$ with $p \in \mathbb{R}\langle x\rangle$ and $\psi(n) = \frac{1}{n^2}\log\int \exp(-n\,\mathrm{Tr}(p(A)))\,dA$ as statistical
models and construct their information geometry. This achieves two goals: it proves the Cramér-Rao
theorem, which is a Cauchy-Schwarz inequality on polynomial functions of the random matrix
(Section 3.2); and it calculates the entropy as the Legendre transform of pressure (Section 1.6).

In Section 2 we relate our construction to free probability by considering the limit as the matrix
size $n$ approaches infinity. We show that the information geometric quantities converge. The
pressure, entropy, and Legendre transform converge to the free pressure, free entropy, and free
Legendre transform [Hia05] respectively. We also show that the information geometry of a Gaussian
perturbation model converges to the free Fisher information measure [Voi93]. Finally we note the
relation to the Free Cramér-Rao Theorem [Voi98] and the fluctuations of random matrices [MS06].

Following is a quick review of classical information geometry, meant as a motivation for our
development. The familiar reader may skip along to Section 1.
0.2. Review of Classical Information Geometry.

Classical information geometry may be viewed as the standard framework for doing convex analysis
(finding minima/maxima) on real-valued functions of random variables. Given a random variable
$X_\theta$ whose distribution function belongs to a parametric model $\{q_\theta(x)\,dx \mid \theta \in \Theta \subset \mathbb{R}^m\}$, and functions
(estimators) $\xi_1, \dots, \xi_m \in C(\mathbb{R})$, one is interested in measuring the sensitivity of $\xi_1(X_\theta), \dots, \xi_m(X_\theta)$
to changes in $\theta$. This analysis is done following the presentation of [AN00] using the methods of
differential geometry, and the resulting theorem is a lower bound on the covariance of the deviations
of the $\xi_i$'s as follows.
A statistical model is a family of probability distributions on $\mathbb{R}$ parameterized by finitely many
real parameters, $S = \{q_\theta(x)\,dx \mid \theta \in \Theta,\ q_\theta(x) > 0,\ \int q_\theta(x)\,dx = 1\}$ with $\Theta \subset \mathbb{R}^m$ open.

An exponential family is a statistical model with $q_\theta(x) = \exp\left(p(x) + \sum_{i=1}^m \theta_i f_i(x) + \psi(\theta)\right)$, where
$\psi(\theta) = -\log\int \exp\left(p(x) + \sum_{i=1}^m \theta_i f_i(x)\right)dx$ and $p, f_1, \dots, f_m \in C(\mathbb{R})$ such that $\psi(\theta)$ converges. We
denote $p_\theta(x) = p(x) + \sum_{i=1}^m \theta_i f_i(x) + \psi(\theta)$.

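An exponential family $S$ is a manifold under the map $\exp(p_\theta(x)) \mapsto \theta$. Its tangent space is the
vector space of random variables

$$T_\theta S = \mathrm{span}\left\{ \frac{\partial}{\partial\theta_i} \exp(p_\theta(x))(X_\theta) \,\middle|\, X_\theta \sim \exp(p_\theta(x)) \right\}_{i=1}^m \cong \mathrm{span}\left\{ \frac{\partial}{\partial\theta_i} p_\theta(X_\theta) \,\middle|\, X_\theta \sim \exp(p_\theta(x)) \right\}_{i=1}^m.$$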

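There is a natural $L^2$-structure on this space which allows us to define an inner-product

$$\langle f, g\rangle_\theta = \int f(x) g(x) \exp(p_\theta(x))\,dx, \qquad f, g \in T_\theta S,$$

and this gives the Fisher Information Metric

$$g_{ij}(\theta) = \left\langle \frac{\partial}{\partial\theta_i} p_\theta,\ \frac{\partial}{\partial\theta_j} p_\theta \right\rangle_\theta.$$

The $L^2$-structure also identifies the potential $p_\theta$ as $-\int^x d_\theta^*(1)(y)\,dy$, where

$$d_\theta : L^2(\exp(p_\theta(x))\,dx) \to L^2(\exp(p_\theta(x))\,dx), \qquad f \mapsto f'$$

is the unbounded differentiation operator. This follows from the calculation

$$\langle d_\theta^*(1)(x), h(x)\rangle_{L^2(\exp(p_\theta(x))dx)} = \int h'(x) \exp(p_\theta(x))\,dx = -\int h(x)\,p_\theta'(x) \exp(p_\theta(x))\,dx = \langle h(x), -p_\theta'(x)\rangle_{L^2(\exp(p_\theta(x))dx)}$$

for any polynomial $h$, where the middle equality is integration by parts. Thus, the tangent space
consists of partial derivatives of $-\int^x d_\theta^*(1)(y)\,dy$ with respect to $\theta_i$.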

In this framework [AN00] prove the Cramér-Rao theorem:

Theorem 0.1. Let $S$ be an exponential family with Fisher information metric $g$, and let
$\xi_1, \dots, \xi_m : \mathbb{R} \to \mathbb{R}$ be unbiased estimators, i.e. $\int \xi_i(x) \exp(p_\theta(x))\,dx = \theta_i$. Then

$$\left[\langle \xi_i(X_\theta) - \theta_i,\ \xi_j(X_\theta) - \theta_j \rangle_\theta\right]_{i,j} \geq g^{-1}(\theta)$$

in the sense of positive semi-definite matrices.
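As a numerical illustration of the classical Cramér-Rao bound, the following sketch checks it for a one-parameter exponential family. The choices $p(x) = -x^2/2$, $f(x) = x$, the estimator $\xi(x) = x$, the grid, and all function names are assumptions made for this example, not taken from [AN00]; the integrals are approximated by a Riemann sum.

```python
import numpy as np

# Quadrature grid; wide enough that the Gaussian tails are negligible.
x = np.linspace(-12.0, 12.0, 200001)
dx = x[1] - x[0]

def density(theta):
    """q_theta(x) = exp(p(x) + theta*f(x) + psi(theta)), p(x) = -x^2/2, f(x) = x."""
    w = np.exp(-0.5 * x**2 + theta * x)
    return w / (w.sum() * dx)  # dividing by the integral plays the role of exp(psi)

def fisher_information(theta):
    """g(theta) = int (d/dtheta p_theta)^2 q_theta dx, with d/dtheta p_theta = x - E[x]."""
    q = density(theta)
    mean = (x * q).sum() * dx
    return ((x - mean) ** 2 * q).sum() * dx

def estimator_stats(theta):
    """Mean and variance of the (assumed) estimator xi(x) = x under q_theta."""
    q = density(theta)
    mean = (x * q).sum() * dx
    var = ((x - mean) ** 2 * q).sum() * dx
    return mean, var

theta = 0.7
g = fisher_information(theta)
mean, var = estimator_stats(theta)
print("unbiased:", abs(mean - theta) < 1e-8)
print("variance:", var, "lower bound 1/g:", 1.0 / g)
```

In this family the estimator attains the bound, so the two printed numbers agree up to quadrature error.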

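Next, to find minima/maxima on $S$ and calculate the Legendre transform of $\psi$, [AN00] specify
a second derivative (the tangent space being the first derivative). This is done by fixing an affine
connection (see [BG80]), which is given in coordinates by

$$\Gamma^{(\alpha)}_{ij,k}(\theta) = \int \left( \frac{\partial^2}{\partial\theta_i\partial\theta_j} p_\theta(x) + \frac{1-\alpha}{2}\, \frac{\partial}{\partial\theta_i} p_\theta(x)\, \frac{\partial}{\partial\theta_j} p_\theta(x) \right) \frac{\partial}{\partial\theta_k} p_\theta(x)\, \exp(p_\theta(x))\,dx,$$

where $\alpha$ is a parameter for the amount of curvature. For example, an exponential family is flat for
$\alpha = 1$, and a mixture family ($q_\theta = \mu_{(X+\theta Y)/(1+\theta)}$ with $X$ and $Y$ independent) is flat for $\alpha = -1$.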
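These connections are distinguished by the duality of the $(\alpha)$- and $(-\alpha)$-connections with
respect to $g$:

Proposition 0.2. $\frac{\partial}{\partial\theta_k} g_{ij}(\theta) = \Gamma^{(\alpha)}_{ki,j}(\theta) + \Gamma^{(-\alpha)}_{kj,i}(\theta)$.

This allows [AN00] to prove

Theorem 0.3. Let $S$ be a manifold with metric $g$, a pair of dual affine connections $\Gamma$, $\Gamma^*$, and a
smooth function $f : \mathbb{R} \to \mathbb{R}$. If $\theta' \in \Theta$ satisfies $\frac{\partial}{\partial\theta_i} f(X_{\theta'}) = 0$ and $\nabla \frac{\partial}{\partial\theta_i} f(X_{\theta'}) \geq g$ in the sense of
positive semi-definite matrices, where $\nabla$ is the covariant derivative, then there exists a small neighborhood
$U$ of $\theta'$ such that $f(X_{\theta'}) = \sup_{\theta \in U} f(X_\theta)$.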



1. Basic Notions 

1.1. Manifold. 

We start with a random $n \times n$ self-adjoint matrix $A$ with complex entries, distributed according
to

(1.1) $\exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))\,dA$ on $M_n^{sa}(\mathbb{C})$

where $p \in \mathbb{R}\langle x\rangle$ is convex, and $\psi(n) = \frac{1}{n^2}\log\int_{M_n^{sa}(\mathbb{C})} \exp(-n\,\mathrm{Tr}(p(A)))\,dA$ is the normalization
to a probability measure. The quantity $\psi$ is also known as the pressure [Hia05]. In this paper
$\mathrm{Tr} : M_n^{sa}(\mathbb{C}) \to \mathbb{C}$ by $A \mapsto \sum_{i=1}^n a_{ii}$, and $dA = \prod_{1 \leq i < j \leq n} d\mathrm{Re}(A_{ij})\,d\mathrm{Im}(A_{ij}) \prod_{1 \leq i \leq n} dA_{ii}$. We recall
a useful fact which guarantees convergence of this model [Bia03]:

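Lemma 1.1. Given $p \in C^1(\mathbb{R})$ convex, $\exists!\, q : \mathbb{R} \to \mathbb{R}$ Borel, $q(x) \geq 0$, $\int q(x)\,dx = 1$ such that
$\forall f \in C(\mathbb{R})$,

$$\frac{1}{n}\int \mathrm{Tr}(f(A)) \exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))\,dA \to \int f(x) q(x)\,dx$$

where $q$ is defined by the integral equation

$$-2\,\mathrm{p.v.}\!\int \frac{q(y)}{y - x}\,dy = p'(x), \qquad \forall x \in \mathrm{supp}(q).$$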
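The convergence in Lemma 1.1 can be observed numerically for the quadratic potential. The sketch below assumes $p(x) = x^2/2$, for which the limit density is the semicircle law $q(x) = \sqrt{4 - x^2}/(2\pi)$ on $[-2, 2]$, and compares normalized trace moments of sampled matrices with moments of $q$. The sampler, matrix size, sample count, and function names are choices made for this illustration only.

```python
import numpy as np

def sample_gue(n, rng):
    """One n x n self-adjoint matrix with density proportional to exp(-n Tr(A^2 / 2))."""
    x = rng.standard_normal((n, n))
    y = rng.standard_normal((n, n))
    g = (x + 1j * y) / np.sqrt(2.0)
    h = (g + g.conj().T) / np.sqrt(2.0)  # diagonal variance 1, E|h_ij|^2 = 1 off the diagonal
    return h / np.sqrt(n)

def empirical_moment(k, n, samples, rng):
    """Monte Carlo estimate of (1/n) E[Tr(A^k)] computed from eigenvalues."""
    total = 0.0
    for _ in range(samples):
        eig = np.linalg.eigvalsh(sample_gue(n, rng))
        total += np.mean(eig ** k)
    return total / samples

def semicircle_moment(k, grid=200001):
    """int x^k q(x) dx for the semicircle density, by a Riemann sum."""
    x = np.linspace(-2.0, 2.0, grid)
    q = np.sqrt(np.maximum(4.0 - x**2, 0.0)) / (2.0 * np.pi)
    return np.sum(x**k * q) * (x[1] - x[0])

rng = np.random.default_rng(1)
m2 = empirical_moment(2, 40, 200, rng)
m4 = empirical_moment(4, 40, 200, rng)
print("second moment:", m2, "vs", semicircle_moment(2))
print("fourth moment:", m4, "vs", semicircle_moment(4))
```

The finite-$n$ moments differ from the limit by terms of order $1/n^2$ plus sampling noise, so only approximate agreement should be expected.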

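Definition 1.2. An exponential family is a family of distributions on $M_n^{sa}(\mathbb{C})$ of the form

$$S = \left\{ \exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{i=1}^m \theta_i F_i(A) + \psi(\theta, n)\right)\right) dA \,\middle|\, \theta \in \Theta \right\}$$

where

$$\psi(\theta, n) = \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{i=1}^m \theta_i F_i(A)\right)\right) dA$$

is the normalization constant, $F_1, \dots, F_m \in C(\mathbb{R})$ are the perturbation functions, and $\Theta \subset \mathbb{R}^m$ is
an open set of parameters (chosen so that the integral in the definition of $\psi$ converges).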

Notation 1.3. Write $p_{\theta,n}(A) = p(A) + \sum_{i=1}^m \theta_i F_i(A) + \psi(\theta,n)$, $p_\theta(A) = p(A) + \sum_{i=1}^m \theta_i F_i(A)$, and
$d\mu_{\theta,n}(A) = \exp(-n\,\mathrm{Tr}(p_{\theta,n}(A)))\,dA$.
1.2. Tangent Space.

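In analogy to the classical case, we define the tangent space by identifying the potential of the
limit distribution $q_\theta$. Since $q_\theta$ is the distribution of a noncommutative random variable, the free
difference quotient plays the role of the derivative:

$$\partial_\theta : L^2(\mathbb{R}, q_\theta) \to L^2(\mathbb{R}, q_\theta) \otimes L^2(\mathbb{R}, q_\theta), \qquad f(x) \mapsto \frac{f(x) - f(y)}{x - y}$$

(this operator is explained in [Voi93]). It is a densely-defined derivation with domain$(\partial_\theta)$ =
polynomials. For $g \in$ domain$(\partial_\theta)$ we have

$$\langle \partial_\theta^*(1 \otimes 1), g\rangle_{L^2(\mathbb{R}, q_\theta)} = \langle 1 \otimes 1, (\partial_\theta g)(x,y)\rangle_{L^2(\mathbb{R}, q_\theta) \otimes L^2(\mathbb{R}, q_\theta)} = \iint \frac{g(x) - g(y)}{x - y}\, q_\theta(x) q_\theta(y)\,dx\,dy$$

$$= -2\int g(x) \left( \int \frac{q_\theta(y)}{y - x}\,dy \right) q_\theta(x)\,dx = \langle p_\theta'(x), g(x)\rangle_{L^2(\mathbb{R}, q_\theta)}.$$

Therefore, the limit potential is identified as $\int^x \partial_\theta^*(1 \otimes 1)(y)\,dy = \int^x p_\theta'(y)\,dy = p_\theta(x)$.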
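Definition 1.5. The potential of $d\mu_{\theta,n}(A)$ is $p_{\theta,n}(A)$.

Proposition 1.6. The tangent space is given by $T_\theta S = \mathrm{span}\left\{ \frac{\partial}{\partial\theta_i} p_{\theta,n}(A) \,\middle|\, A \sim d\mu_{\theta,n}(A) \right\}_{i=1}^m$, regarded
as a vector space of random variables.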

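Proof. Fix $\theta \in \Theta$. Each $\frac{\partial}{\partial\theta_1} p_{\theta,n}, \dots, \frac{\partial}{\partial\theta_m} p_{\theta,n}$ defines a curve through $\theta$:

$$\gamma_i(t) = \exp\left(-n\,\mathrm{Tr}\left(p_{\theta,n}(A) + t\,\frac{\partial}{\partial\theta_i} p_{\theta,n}(A) + \psi(t,n)\right)\right).$$

Conversely, suppose $h \in \mathbb{R}\langle x\rangle$ is convex and $\exp(-n\,\mathrm{Tr}(h(A) + \phi(n))) \in S$, with $\phi(n) =
\frac{1}{n^2}\log\int \exp(-n\,\mathrm{Tr}(h(A)))\,dA$. Get $\theta'$ such that $\exp(-n\,\mathrm{Tr}(h(A) + \phi(n)))\,dA = d\mu_{\theta',n}(A)$. Then
by Lemma 1.1, $\exp(-n\,\mathrm{Tr}(h(A) + \phi(n)))$ and $d\mu_{\theta',n}$ both converge to $q_{\theta'}$ satisfying

$$h'(x) = 2\,\mathrm{p.v.}\!\int \frac{q_{\theta'}(y)}{x - y}\,dy = p_{\theta'}'(x).$$

Therefore, $h$ and $p_{\theta'}$ differ only by an additive constant, which may be absorbed into the
normalization $\phi(n)$. Thus $h(A) = p(A) + \sum_{i=1}^m \theta_i' \frac{\partial}{\partial\theta_i} p_{\theta,n}(A)$, and we conclude that all curves in $S$ through
$\theta$ are given by a linear combination of $\frac{\partial}{\partial\theta_i} p_{\theta,n}$, $i = 1, \dots, m$. □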



1.3. The Fisher Information Metric. 

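A key feature of the metric defined by [AN00] is that it satisfies the equation

(1.2) $g_{ij}(\theta) = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta)$.

We take this equation as the starting point for our definitions, and we calculate

(1.3) $\frac{\partial}{\partial\theta_i}\psi(\theta,n) = \frac{\partial}{\partial\theta_i}\left( \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{r=1}^m \theta_r F_r(A)\right)\right) dA \right) = -\frac{1}{n}\int \mathrm{Tr}(F_i)\,d\mu_{\theta,n}(A)$,

and

(1.4) $\frac{\partial}{\partial\theta_i}\, d\mu_{\theta,n}(A) = -n\,\mathrm{Tr}\left(F_i + \frac{\partial}{\partial\theta_i}\psi(\theta,n)\right) d\mu_{\theta,n}(A)$.

We define an inner-product and a corresponding metric and check that it satisfies (1.2).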
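Definition 1.7.

$$\langle f(A), g(A)\rangle_{\theta,n} = \int \mathrm{Tr}(f(A))\,\mathrm{Tr}(g(A))\,d\mu_{\theta,n}(A)$$

(1.5) $g_{ij}(\theta,n) = \left\langle \frac{\partial}{\partial\theta_i} p_{\theta,n}, \frac{\partial}{\partial\theta_j} p_{\theta,n} \right\rangle_{\theta,n} = \int \mathrm{Tr}\left(\frac{\partial}{\partial\theta_i} p_{\theta,n}(A)\right) \mathrm{Tr}\left(\frac{\partial}{\partial\theta_j} p_{\theta,n}(A)\right) d\mu_{\theta,n}(A)$

Proposition 1.8. $g_{ij}(\theta,n) = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta,n)$.
Proof. We calculate:

$$g_{ij}(\theta,n) = \int \mathrm{Tr}\left(F_i - \frac{1}{n}\int \mathrm{Tr}(F_i)\,d\mu_{\theta,n}(A)\right) \mathrm{Tr}\left(F_j - \frac{1}{n}\int \mathrm{Tr}(F_j)\,d\mu_{\theta,n}(A)\right) d\mu_{\theta,n}(A),$$

by equation (1.3).

Now $\mathrm{Tr}\left(\frac{1}{n}\int \mathrm{Tr}(F_i)\,d\mu_{\theta,n}(A)\right) = \int \mathrm{Tr}(F_i)\,d\mu_{\theta,n}(A)$, so we have

$$g_{ij}(\theta,n) = \int \mathrm{Tr}(F_i)\,\mathrm{Tr}(F_j)\,d\mu_{\theta,n}(A) - \int \mathrm{Tr}(F_i)\,d\mu_{\theta,n}(A) \int \mathrm{Tr}(F_j)\,d\mu_{\theta,n}(A)$$

$$= \int \mathrm{Tr}(F_i)\,\mathrm{Tr}\left(F_j + \frac{\partial}{\partial\theta_j}\psi(\theta,n)\right) d\mu_{\theta,n}(A) = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta,n)$$

using equations (1.3) and (1.4). □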
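The identity "metric equals second derivative of the pressure" can be sanity-checked in the simplest case $n = 1$, where a $1 \times 1$ self-adjoint matrix is a real number and $\mathrm{Tr}$ is the identity. The sketch below assumes $p(x) = x^4/4$ and a single perturbation function $F(x) = x^2$ (both chosen only for illustration), approximates the integrals by a Riemann sum, and compares the variance of $F$ with a central finite difference of $\psi$.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 160001)
dx = x[1] - x[0]

def p(t):
    return t**4 / 4.0   # assumed convex potential

def F(t):
    return t**2         # assumed perturbation function

def psi(theta):
    """psi(theta, 1) = log int exp(-(p + theta*F)) dx, the n = 1 case of (1/n^2) log Z."""
    return np.log(np.sum(np.exp(-(p(x) + theta * F(x)))) * dx)

def metric(theta):
    """g(theta, 1) = int (F - E[F])^2 dmu, using d/dtheta p_theta = F - E[F] when n = 1."""
    w = np.exp(-(p(x) + theta * F(x)))
    q = w / (np.sum(w) * dx)
    mean_F = np.sum(F(x) * q) * dx
    return np.sum((F(x) - mean_F) ** 2 * q) * dx

theta, h = 0.3, 1e-3
second_derivative = (psi(theta + h) - 2.0 * psi(theta) + psi(theta - h)) / h**2
g = metric(theta)
print("g =", g, " d^2 psi / d theta^2 =", second_derivative)
```

For $n > 1$ the same comparison would require integrating over matrices; the scalar case is used here only because it can be evaluated by one-dimensional quadrature.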



Remark 1.9. In connection to Voiculescu's free probability theory, it seems natural to define the
inner-product on the tangent space as

(1.6) $\langle f, g\rangle_\theta = \frac{1}{n}\int \mathrm{Tr}(f(A) g(A))\,d\mu_{\theta,n}(A)$.

However, the metric (1.6) does not satisfy equation (1.2), which is crucial in order to calculate the
Legendre transform in Section 1.6.

Also, if we restrict our attention to Gaussian Unitary Ensemble perturbations (which give rise
to a semicircular perturbation as $n \to \infty$), the metric (1.5) coincides with metric (1.6) (see Section
2.2).
1.4. The $(\alpha)$-Connections.

To calculate the Legendre transform we need a pair of dual affine connections on the manifold
(see [BG80]). In fact we define a family of pairs of dual connections with curvature parameter
$\alpha \in [-1, 1]$, and we denote the connection coefficients by $\Gamma^{(\alpha),n}_{ij,k}(\theta)$. To compute the Legendre
transform, the $(\alpha)$-connection must be dual to the $(-\alpha)$-connection, i.e.



We take this as our starting point for the definition of F, and we calculate: 

d 



This leads us to the following definition 
Definition 1.10. 



n y Tr (^-^ps^A)^ Tr (^-^pe^A)^ Tr dfxg^A). 



2 

Notice that the connection coefficients depend on the choice of coordinate system, and this gives
the notion of flatness:

Definition 1.11. A coordinate system $\{\zeta_i\}$ is $(\alpha)$-flat if $\Gamma^{(\alpha),n}_{ij,k}(\zeta) = 0$. That is, the vector fields
$X_i = \frac{\partial}{\partial\zeta_i}$ are parallel with respect to the $(\alpha)$-connection.

Proposition 1.12. An exponential family $S$ is $(1)$-flat.
Proof. Calculate: 

= -n^|^V^(^,n)- Tr(F,(A))d/i,,„(A)- J Tr d/ie,„(A)^ =0. 

□ 

Proposition 1.13. The $(\alpha)$- and $(-\alpha)$-connections are mutually dual with respect to $g(\theta,n)$.



INFORMATION GEOMETRY OF RANDOM MATRIX MODELS 7 
Proof. This follows from the calculation: 

W,''^ (^,^) =/Tr {qIqo-PoAA?! Tr d^^eAA) 

- ^ ■ n ■ / Xr Tr Tr ( 

+ 4^ ■ n ■ / Tr Tr ( Tr ( J^P.n(A)) 

□ 



Corollary 1.14. $\frac{\partial}{\partial\theta_k} g_{ij}(\theta,n) = \Gamma^{(0),n}_{ki,j}(\theta) + \Gamma^{(0),n}_{kj,i}(\theta)$, i.e. the $(0)$-connection is the Levi-Civita
(metric) connection.



1.5. Several Independent Matrices. 

Our discussion started with a single random matrix model to familiarize the reader with the 
geometric notions and calculations, but in fact it extends to several independent matrices as follows. 
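Start with independent random matrices $A_1, \dots, A_k$ with $A_r$ distributed according to

$$\exp\left(-n\,\mathrm{Tr}\left(p(r)(A_r) + \sum_{i=1}^m \theta_i(r) F_i(r)(A_r) + \psi(r)(\theta(r), n)\right)\right) dA_r.$$

Then $(A_1, \dots, A_k)$ is distributed according to

$$\exp\left(-n\,\mathrm{Tr}\left(\sum_{r=1}^k p(r)(A_r) + \sum_{r=1}^k \sum_{i=1}^m \theta_i(r) F_i(r)(A_r) + \sum_{r=1}^k \psi(r)(\theta(r), n)\right)\right) dA_1 \cdots dA_k.$$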

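Notation 1.15. Write $P(A_1, \dots, A_k)$ for $\sum_{r=1}^k p(r)(A_r)$, $\theta$ for
$(\theta_1(1), \dots, \theta_m(1), \theta_1(2), \dots, \theta_m(2), \dots, \theta_m(k))$, and $P_{\theta,n}(A_1, \dots, A_k)$ for

$$P(A_1, \dots, A_k) + \left(\sum_{r=1}^k \sum_{i=1}^m \theta_i(r) F_i(r)(A_r)\right) + \Psi(\theta, n),$$

with $\Psi(\theta,n) = \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(P(A_1, \dots, A_k) + \sum_{r=1}^k \sum_{i=1}^m \theta_i(r) F_i(r)(A_r)\right)\right) dA_1 \cdots dA_k$.

Also write $d\mu_{\theta(r),n}(A_r)$ for $\exp\left(-n\,\mathrm{Tr}\left(p(r)(A_r) + \sum_{i=1}^m \theta_i(r) F_i(r)(A_r)\right)\right) dA_r$, and $d\mu_{\theta,n}(A_1, \dots, A_k)$
for $d\mu_{\theta(1),n}(A_1) \cdots d\mu_{\theta(k),n}(A_k)$.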



Notice, by recentering and rescaling we may assume that $\frac{1}{n}\int \mathrm{Tr}(A_r)\,d\mu_{\theta(r),n}(A_r) = 0$ and
$\frac{1}{n}\int \mathrm{Tr}(A_r^2)\,d\mu_{\theta(r),n}(A_r) = 1$ for $r = 1, \dots, k$. Since $A_1, \dots, A_k$ are independent, [VDN92, Theorem
4.4.1] shows that $(A_1, \dots, A_k)$ converges as $n \to \infty$ in the sense that $\exists \tau$ a tracial state on
$\mathbb{C}\langle x_1, \dots, x_k\rangle$ such that $\forall w \in \mathbb{C}\langle x_1, \dots, x_k\rangle$

$$\frac{1}{n}\int \mathrm{Tr}(w(A_1, \dots, A_k))\,d\mu_{\theta,n}(A_1, \dots, A_k) \to \tau(w).$$
Now we must identify the potential. In the classical case, when one considers several independent
random variables $X_1, \dots, X_k$ instead of a single variable, with $X_i$ distributed according to
$\exp(p_i(x))\,dx$, the potential is determined by the equation

for $r = 1, \dots, k$, with

$$\partial_{x_r} : L^2\left(\exp\left(\sum_r p_r(x_r)\right) dx_1 \cdots dx_k\right) \to L^2\left(\exp\left(\sum_r p_r(x_r)\right) dx_1 \cdots dx_k\right), \qquad g(x_1, \dots, x_k) \mapsto \frac{\partial}{\partial x_r} g(x_1, \dots, x_k)$$

the densely defined partial differentiation operator with domain$(\partial_{x_r})$ = polynomials.

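In the multi-matrix case, the equation for $h_\theta$ to be a potential in the limit $n \to \infty$ becomes

with $\partial_{x_r}$ and $\mathcal{D}_{x_r}$ defined in [Voi98] as follows. $\partial_{x_r} : (\mathbb{C}\langle x_1, \dots, x_k\rangle, \tau) \to (\mathbb{C}\langle x_1, \dots, x_k\rangle, \tau) \otimes
(\mathbb{C}\langle x_1, \dots, x_k\rangle, \tau)$ is defined by $\partial_{x_r}(x_r) = 1 \otimes 1$ and $\partial_{x_r}(x_s) = 0$ for $s \neq r$. $\mathcal{D}_{x_r} : L^2(\mathbb{C}\langle x_1, \dots, x_k\rangle, \tau) \to
L^2(\mathbb{C}\langle x_1, \dots, x_k\rangle, \tau)$ is defined by $\mathcal{D}_{x_r} = \sigma \circ \partial_{x_r}$ where $\sigma(x \otimes y) = yx$.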

[Voi98, Prop 3.6] shows that $\partial_{x_r}^*(1 \otimes 1) \in L^2(\mathbb{C}\langle x_r\rangle, \tau)$, so we may apply the discussion from
Section 1.2 to get

$$\partial_{x_r}^*(1 \otimes 1) = \frac{d}{dx_r}\left(p(r)(x_r) + \sum_{i=1}^m \theta_i(r) F_i(r)(x_r)\right).$$

Thus, the condition for a potential of several independent matrices becomes

$$\mathcal{D}_{x_r} h(x_1, \dots, x_k) = \frac{d}{dx_r}\left(p(r)(x_r) + \sum_{i=1}^m \theta_i(r) F_i(r)(x_r)\right).$$

This has a solution

$$h(x_1, \dots, x_k) = P_{\theta,n}(x_1, \dots, x_k),$$

which leads to the definition

Definition 1.16. The potential of $d\mu_{\theta,n}(A_1, \dots, A_k)$ is $P_{\theta,n}(A_1, \dots, A_k)$.

Definition 1.17. The tangent space to this model is $T_\theta S = \mathrm{span}\left\{ \frac{\partial}{\partial\theta_i(r)} P_{\theta,n}(A_1, \dots, A_k) \right\}$.

Next we define an inner-product and a corresponding metric that satisfies equation (1.2).
Definition 1.18.

$$\langle f, h\rangle_\theta = \int \mathrm{Tr}(f(A_1, \dots, A_k))\,\mathrm{Tr}(h(A_1, \dots, A_k))\,d\mu_{\theta,n}(A_1, \dots, A_k).$$

Definition 1.19.

$$G_{i(r)j(s)}(\theta, n) = \int \mathrm{Tr}\left(\frac{\partial}{\partial\theta_i(r)} P_{\theta,n}\right) \mathrm{Tr}\left(\frac{\partial}{\partial\theta_j(s)} P_{\theta,n}\right) d\mu_{\theta,n}(A_1, \dots, A_k).$$

It is a straightforward calculation that
Proposition 1.20. $G_{i(r)j(s)}(\theta,n) = \frac{\partial^2}{\partial\theta_i(r)\,\partial\theta_j(s)} \Psi(\theta,n)$.



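Remark 1.21. Notice that for $r \neq s$, $A_r$ is independent of $A_s$, so $G_{i(r)j(s)} = 0$. Therefore, if
$A_1, \dots, A_k$ are identically distributed we have $p_1 = p_2 = \cdots = p_k$ and $\theta(1) = \theta(2) = \dots = \theta(k)$, so

$$\Psi(\theta, n) = \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(\sum_{r=1}^k p_r(A_r) + \sum_{r=1}^k \sum_{i=1}^m \theta_i(r) F_i(r)(A_r)\right)\right) dA_1 \cdots dA_k$$

$$= \frac{1}{n^2}\sum_{r=1}^k \log\int \exp\left(-n\,\mathrm{Tr}\left(p_r(A_r) + \sum_{i=1}^m \theta_i(r) F_i(r)(A_r)\right)\right) dA_r.$$

Thus, as in the classical case

$$G(\theta, n) = \begin{pmatrix} g(\theta(1), n) & & \\ & \ddots & \\ & & g(\theta(k), n) \end{pmatrix}.$$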

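We also define the $(\alpha)$-connections
Definition 1.22.

Using the argument in Remark 1.21 for identically distributed $A_1, \dots, A_k$ we have $\Gamma^{(\alpha),n}_{i(r)j(s),k(t)}(\theta) = 0$
if $r \neq s \neq t$, or $r \neq t$, and $\Gamma^{(\alpha),n}_{i(r)j(s),k(t)}(\theta) = \Gamma^{(\alpha),n}_{ij,k}(\theta(1))$ otherwise.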

Definition 1.23. Given independent random matrices $A_1, \dots, A_k$ with distributions of the form
(1.1), the Information Manifold associated to $(A_1, \dots, A_k)$ is the geometric structure $S = (M, g, \Gamma, \Gamma^*)$
described in this section.

Combining our observations, we obtain the following theorem.

Theorem 1.24. Let $A_1, \dots, A_k$ be independent random matrices with distribution functions of the
form (1.1), let $S_i$ be the information manifold associated to $A_i$, and let $S$ be the information manifold
associated to $(A_1, \dots, A_k)$. Then $S = S_1 \oplus \cdots \oplus S_k$.
1.6. Legendre Transform of Pressure.

In this section we calculate the Legendre transform of the pressure $\psi(\theta,n)$. Since the notation
for several independent matrices is cumbersome, we calculate with a single matrix and by Theorem
1.24 our calculations extend to several independent matrices.

[AN00, Section 3.3-3.5] define the Legendre transform for a smooth real-valued function on a
Riemannian manifold with a pair of dual connections; this includes our construction, so their
discussion applies in our case.

Following [AN00, Section 3.3-3.5], define a new coordinate system

$$\eta_i = \frac{1}{n}\int \mathrm{Tr}(F_i(A))\,d\mu_{\theta,n}(A).$$

According to equation (1.3),



so we have 



and similarly 

de 



Therefore, $\{\theta_i\}$ and $\{\eta_i\}$ are coordinate systems which are mutually dual with respect to $g_{ij}(\theta,n)$
[AN00, Section 3.5].

Proposition 1.25. $\{\eta_i\}$ is $(-1)$-flat.

Proof. The $\{\theta_i\}$ coordinate system is $(1)$-flat. By Proposition 1.13, the dual coordinate system is
$(-1)$-flat, so $\{\eta_i\}$ is $(-1)$-flat. □



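Definition 1.26. The Legendre Transform of $\psi(\theta,n)$ is

$$\varphi(\theta, n) = \sup_{\theta'} \left\{ \frac{1}{n}\int \mathrm{Tr}\left(p(A) + \sum_{i=1}^m \theta_i' F_i(A) + \psi(\theta', n)\right) d\mu_{\theta,n}(A) - \frac{1}{n}\int \mathrm{Tr}(p(A))\,d\mu_{\theta,n}(A) \right\}.$$

Proposition 1.27.

$$\varphi(\theta,n) = \frac{1}{n}\int \mathrm{Tr}(p_{\theta,n}(A))\,d\mu_{\theta,n}(A) - \frac{1}{n}\int \mathrm{Tr}(p(A))\,d\mu_{\theta,n}(A).$$

Proof. [AN00, Section 3.5] show that

$$\varphi(\theta,n) = \sum_{i=1}^m \theta_i \eta_i(\theta) + \psi(\theta,n) = \frac{1}{n}\int \mathrm{Tr}(p_{\theta,n}(A))\,d\mu_{\theta,n}(A) - \frac{1}{n}\int \mathrm{Tr}(p(A))\,d\mu_{\theta,n}(A).$$

□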
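The dual-coordinate picture can be illustrated in the scalar case $n = 1$, where all integrals are one-dimensional. The sketch below assumes $p(x) = x^4/4$ and one perturbation function $F(x) = x^2$ (illustrative choices), takes $\eta(\theta) = \int F\,d\mu_\theta$ and $\varphi(\theta) = \theta\,\eta(\theta) + \psi(\theta)$ as in the proof above, checks that this agrees with the difference of expected potentials, and checks numerically that $d\varphi/d\eta = \theta$, which is the usual Legendre duality relation.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 160001)
dx = x[1] - x[0]

def p(t):
    return t**4 / 4.0   # assumed convex potential

def F(t):
    return t**2         # assumed perturbation function

def psi(theta):
    """Pressure for n = 1: log of the normalizing integral."""
    return np.log(np.sum(np.exp(-(p(x) + theta * F(x)))) * dx)

def expectation(func_values, theta):
    """Integral of func_values against the normalized density at parameter theta."""
    w = np.exp(-(p(x) + theta * F(x)))
    q = w / (np.sum(w) * dx)
    return np.sum(func_values * q) * dx

def eta(theta):
    """Dual coordinate: expected value of F."""
    return expectation(F(x), theta)

def phi(theta):
    """Legendre transform written as theta * eta + psi."""
    return theta * eta(theta) + psi(theta)

theta, h = 0.3, 1e-4
# Proposition 1.27 for n = 1: E[p_theta + psi] - E[p]
phi_alt = expectation(p(x) + theta * F(x) + psi(theta), theta) - expectation(p(x), theta)
# Derivative of phi with respect to eta along the curve theta -> (eta, phi)
dphi_deta = (phi(theta + h) - phi(theta - h)) / (eta(theta + h) - eta(theta - h))
print("phi:", phi(theta), "alternative form:", phi_alt)
print("d phi / d eta:", dphi_deta, "theta:", theta)
```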



Following the discussion in [AN00, Section 3.5], the first summand in Proposition 1.27 is the
analogue of entropy,
Definition 1.28.

H{dfie,n) = -- I Tt {p9,n{^))d'^J'e,n{^)■ 
n J 

Also according to the discussion in [AN00, Section 3.5],
Corollary 1.29. Both H{diie,n) and ip{9,n) are convex in 9. 



1.7. Calculations on Well-Known Models.

In this section we present some calculations on random matrix models that appear in applications. 

1.7.1. Gaussian Unitary Ensemble (GUE). 

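Classically the most ubiquitous statistical model is the Gaussian family

$$\exp\left(-\frac{(x-\mu)^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right),$$

which is an exponential family by setting $\theta_1 = -\frac{\mu}{\sigma^2}$, $\theta_2 = \frac{1}{2\sigma^2}$, $\psi(\theta_1, \theta_2) = \frac{\mu^2}{2\sigma^2} + \frac{1}{2}\log(2\pi\sigma^2)$, and writing

$$\exp\left(-\frac{(x-\mu)^2}{2\sigma^2} - \frac{1}{2}\log(2\pi\sigma^2)\right) = \exp\left(-\left(\theta_2 x^2 + \theta_1 x + \psi(\theta_1, \theta_2)\right)\right).$$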
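The change to natural coordinates for the Gaussian family is a purely algebraic identity and is easy to confirm numerically. The sketch below reads the coordinates as $\theta_1 = -\mu/\sigma^2$, $\theta_2 = 1/(2\sigma^2)$, $\psi = \mu^2/(2\sigma^2) + \frac{1}{2}\log(2\pi\sigma^2)$, and compares the two log-densities on a grid; the function names and the test values of $\mu$ and $\sigma$ are arbitrary choices for the example.

```python
import numpy as np

def gaussian_log_density(x, mu, sigma):
    """Log of the N(mu, sigma^2) density in the usual (mu, sigma) form."""
    return -(x - mu) ** 2 / (2.0 * sigma**2) - 0.5 * np.log(2.0 * np.pi * sigma**2)

def natural_coordinates(mu, sigma):
    """Map (mu, sigma) to (theta1, theta2, psi) as read from the text."""
    theta1 = -mu / sigma**2
    theta2 = 1.0 / (2.0 * sigma**2)
    psi = mu**2 / (2.0 * sigma**2) + 0.5 * np.log(2.0 * np.pi * sigma**2)
    return theta1, theta2, psi

def exponential_family_log_density(x, theta1, theta2, psi):
    """Log-density in exponential-family form: -(theta2 x^2 + theta1 x + psi)."""
    return -(theta2 * x**2 + theta1 * x + psi)

x = np.linspace(-5.0, 5.0, 1001)
mu, sigma = 0.8, 1.7
theta1, theta2, psi = natural_coordinates(mu, sigma)
lhs = gaussian_log_density(x, mu, sigma)
rhs = exponential_family_log_density(x, theta1, theta2, psi)
print("max abs difference:", np.max(np.abs(lhs - rhs)))
```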

The corresponding random matrix model is the Gaussian Unitary Ensemble (GUE), with distri- 
bution function 

exp (^-^Tr|^^^ + -log^ 
which is an exponential family under the same coordinates 6'i = — 6*2 = 2~2, and tp{9i,92,n) 



We see that 



o2 

.^{9r^92) = -^^{9uhn), 



d9M,^ ' d9Mj 



so the Fisher information metric of the GUE model in $(\theta_1, \theta_2)$-coordinates is the same as the
classical Fisher information metric of the Gaussian model in $(\theta_1, \theta_2)$-coordinates. Using the change of
coordinate rule gki{C) = J2Z=i9ijip) (l^) (if)' the equality ^ = ^, and the equality = Ir 
for $i = 1, 2$, we see that the Fisher information metric for the GUE model in the $(\mu, \sigma)$-coordinates
is the same as the Fisher information metric for the Gaussian model in the $(\mu, \sigma)$-coordinates:

gij{fi,a,n) = [ 2_ 

Taking the limit $n \to \infty$, the semicircle and Gaussian distributions have the same Fisher
information metric.

1.7.2. Laguerre Unitary Ensemble (LUE).

Another well-known model in random matrix theory is the Laguerre Unitary Ensemble (LUE),
also known as a Wishart matrix. This is defined in [FW02] as the random matrix $A = X^*X$ with
$X \sim \exp(-n\,\mathrm{Tr}(X^*X))$ (we take $n = N$ in the definition of [FW02] so their notation matches
ours). [FW02] show that the eigenvalues of $A$ are distributed according to
n 

■ n ~ n ^^^'> ■ xx,>odXk, 

l<i<j<n k=l 

with $Z_n$ the normalization constant. Therefore, $A$ is distributed according to $Z_n^{-1} \cdot \exp(-n\,\mathrm{Tr}(A))\,dA$
on $\{A \in M_n^{sa}(\mathbb{C}) \mid A \geq 0\}$ as an orthogonally invariant model. We will not discuss
orthogonally-invariant models in general, but our definitions make sense verbatim in this case.

Instead of starting with a standard LUE, we may parameterize its variance, and rescale it for
convergence as $n \to \infty$ to obtain

Z-\t) ■ exp (^-nTr (^^^^ on { A G M„^^(C)| A > O} 

with 

Z„W= / exp(-„Tr(^))dA = ^ exp(-„Tr(^))dA=ilog^. 

{A&MSA{<c)\a>Q} 

This is an exponential family by setting 61 = ip{6i,n) = | log (7r/n^^i), and writing 

Z-\t) - exp (^-tiTt (^J^^ = exp(-nTr(^iA + ^(^i,n))). 
Now Q^g^tp{9i,n) = ^^Jjy, so the Fisher information metric of the LUE model is 

gnit) = i^. 



2. The $n \to \infty$ case

2.1. Convergence.

In this section we verify that the tangent space, Fisher information metric, $(\alpha)$-connections,
pressure, and entropy converge as $n \to \infty$. In fact we find that the entropy and pressure converge to
the free entropy and free pressure, and the Fisher information metric of the semicircular perturbation
model coincides with Voiculescu's Fisher information measure.

First we note that the tangent space, regarded as a vector space of random variables, converges
in moments:

Proposition 2.1.

Proof. By Lemma 1.1. □

Proposition 2.2. The metric $g_{ij}(\theta,n)$ defined by equation (1.5) converges.
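Proof. By definition $g_{ij}(\theta,n) = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta,n)$. [EM03] show that since $p_\theta(A)$ is a polynomial, the
potential $p_\theta(A) + \frac{1}{2}A^2$ has the expansion

$$\log\int \exp\left(-n\,\mathrm{Tr}\left(p_\theta(A) + \frac{1}{2}A^2\right)\right) dA = n^2 e_0(\theta) + e_1(\theta) + \frac{1}{n^2} e_2(\theta) + \dots$$

with $e_i$ an analytic function of $\theta$ for $i = 0, 1, \dots$. Notice that

$$\frac{\partial^2}{\partial\theta_i \partial\theta_j} \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(p_\theta(A) + \frac{1}{2}A^2\right)\right) dA = \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta,n),$$

because $\frac{\partial}{\partial\theta_i}\left(-n\,\mathrm{Tr}\left(\frac{1}{2}A^2\right)\right) = 0$.
Therefore,

$$\frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta,n) = \frac{\partial^2}{\partial\theta_i \partial\theta_j} e_0(\theta) + \frac{1}{n^2}\frac{\partial^2}{\partial\theta_i \partial\theta_j} e_1(\theta) + \frac{1}{n^4}\frac{\partial^2}{\partial\theta_i \partial\theta_j} e_2(\theta) + \dots$$

Thus, we have

$$\lim_{n\to\infty} g_{ij}(\theta,n) = \lim_{n\to\infty} \frac{\partial^2}{\partial\theta_i \partial\theta_j}\psi(\theta,n) = \frac{\partial^2}{\partial\theta_i \partial\theta_j} e_0(\theta).$$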



□ 



Proposition 2.3. For $\alpha \in [-1, 1]$, the $(\alpha)$-connections converge.
Proof. We have 

(2.1) / TV n)) Tr (f, + n)) ^ 

n^|^V^(^,n)- i:i{F,{A))diieAA)- j Tr (F, (A)) =0 
where the second equality is due to equation (|1.3lj . 
By equation (I2.ip . 

r!;i'" = - /Tr Tr Tr d^eAA). 

Writing out -^gij (6*, n) and using equation ( 12.ip again, we see that 



r^")'" - - . —a- (6 



1 — a 8 



n 



The argument in Proposition 12.21 using |EMn3| shows that 

8 8^ 8^ 



Therefore, 



lim r(")'" - ^ ~ " r (6\ 



□ 
Proposition 2.4. The dual coordinate system converges.
Proof. According to Lemma 1.1,

$$\lim_{n\to\infty} \eta_i = \lim_{n\to\infty} \frac{1}{n}\int \mathrm{Tr}(F_i(A))\,d\mu_{\theta,n}(A) = \int F_i(x) q_\theta(x)\,dx.$$

□

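Theorem 2.5. The Legendre transform of $\psi$ converges.

Proof. According to the discussion in Section 1.6, the Legendre transform of $\psi(\theta,n)$ is

$$\varphi(\theta,n) = \frac{1}{n}\int \mathrm{Tr}(p_{\theta,n}(A))\,d\mu_{\theta,n}(A) - \frac{1}{n}\int \mathrm{Tr}(p(A))\,d\mu_{\theta,n}(A)$$

$$= \frac{1}{n}\int \mathrm{Tr}\left(\sum_{i=1}^m \theta_i F_i(A)\right) d\mu_{\theta,n}(A) + \psi(\theta,n).$$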



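By Lemma 1.1,

$$\frac{1}{n}\int \mathrm{Tr}\left(\sum_{i=1}^m \theta_i F_i(A)\right) d\mu_{\theta,n}(A) \to \int \left(\sum_{i=1}^m \theta_i F_i(x)\right) q_\theta(x)\,dx,$$

and

$$\frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{i=1}^m \theta_i F_i(A)\right)\right) dA \to \iint \log|x - y|\, q_\theta(x) q_\theta(y)\,dx\,dy - \int \left(p(x) + \sum_{i=1}^m \theta_i F_i(x)\right) q_\theta(x)\,dx.$$

Therefore,

$$\lim_{n\to\infty} \varphi(\theta,n) = \int \left(\sum_{i=1}^m \theta_i F_i(x)\right) q_\theta(x)\,dx + \iint \log|x - y|\, q_\theta(x) q_\theta(y)\,dx\,dy - \int \left(p(x) + \sum_{i=1}^m \theta_i F_i(x)\right) q_\theta(x)\,dx$$

$$= \iint \log|x - y|\, q_\theta(x) q_\theta(y)\,dx\,dy - \int p(x) q_\theta(x)\,dx.$$

□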



Let $\varphi(\theta)$ denote the limit of $\varphi(\theta,n)$ and let $\chi(q_\theta)$ denote the Free Entropy of $q_\theta$. Since $\chi(q_\theta) =
\iint \log|x - y|\, q_\theta(x) q_\theta(y)\,dx\,dy$, the above calculation shows that



Corollary 2.6. 



Combining Corollary 2.6 with Corollary 1.29 gives a new kind of convexity for free entropy of the
limit of a random matrix, which is "dual" to the convexity under addition of the random variable:

Corollary 2.7. Suppose $d\mu_{\theta,n}$ converges to $q_\theta$ for $\theta \in \Theta$ open. Then $\chi(q_\theta)$ is convex in $\theta$.
Remark 2.8. [Hia05] defines the free pressure for $R > 0$ and $h \in C([-R, R])$ as

$$\pi_R(h) = \sup\left\{ -\int h(x)\,d\mu(x) + \chi(\mu) \,\middle|\, \mu \in \mathcal{M}([-R, R]) \right\},$$

where $\mathcal{M}([-R, R])$ is the set of Borel probability measures supported on $[-R, R]$. He calculates the
free entropy as the Legendre transform of free pressure with respect to a Banach space duality,

(2.2) $\chi(\mu) = \inf\left\{ \int h(x)\,d\mu(x) + \pi_R(h) \,\middle|\, h \in C([-R, R]) \right\}$.



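Now given a random matrix $A \sim \exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))$ with $p \in \mathbb{R}\langle x\rangle$ convex, it converges
to a measure $\mu$. Fix $R$ so that $\mathrm{supp}(\mu) \subset [-R, R]$, and fix a collection $F_1, \dots, F_k \in \mathbb{R}\langle x\rangle$. Consider
the random matrix model

$$\exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{i=1}^k \theta_i F_i(A) + \psi(\theta, n)\right)\right).$$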



By definition of ip and tir, 



iP{9,n)^nRlj20iFi{A) 



and by Corollary [2l 



We showed in section [L6l that 

(/9(0,n; 



inf |i j Tr [f^mim d^^,^n{A) + ^{e\ 



n] 



where the Legendre transform comes from a duality with respect to $g_{ij}(\theta)$. In the limit $n \to \infty$ this
is the equation

$$\chi(\mu) = \inf\left\{ \int h(x)\,d\mu(x) + \pi_R(h) \,\middle|\, h \in \mathrm{span}\{F_i\} \right\}$$

which is a restriction of (2.2) to $h \in \mathrm{span}\{F_i\}$.

Therefore, the restriction of Hiai's Banach space duality to any finite linear span agrees with the
corresponding Fisher information metric duality.



Remark 2.9. Given a noncommutative random variable $X \in (\mathcal{A}, \tau)$ satisfying $\partial_X^*(1 \otimes 1) \in \mathbb{R}\langle x\rangle$ and
$\chi(X) < \infty$, we can uniquely define its information geometry (up to a constant) as follows.

Suppose $p \in \mathbb{R}\langle x\rangle$ is convex such that the random matrix model $\exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))$
converges to $X$ in the sense that $\frac{1}{n}\int \mathrm{Tr}(f(A)) \exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))\,dA \to \tau(f(X))$ for all $f \in
C(\mathbb{R})$. Fix perturbation functions $F_1, \dots, F_m \in \mathbb{R}\langle x\rangle$.

Definition 2.10. The information geometry of $X$ relative to $F_1, \dots, F_m$ is the limit of the
information geometry of $\exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{i=1}^m \theta_i F_i + \psi\right)\right)$ at $\theta = 0$.
To show uniqueness, suppose we have $q \in \mathbb{R}\langle x\rangle$ convex such that the random matrix model
exp (— nTr {q{A) + 4>{n))) also converges to X. By Lemma [HI] = / "^^J^^y^ = q'{x), so q{x) = 
p{x) + c. The constant c may be absorbed into = + c. Since the tangent vectors, metric, and 
connections only depend on ^(f) = ^t/) and -q§^4> = a^W"^' ^^'^ information geometries are the 
same at 6* = 0. 

To show existence, since x{X) = - // log j^\d^x{x)d^ix{y) < oo, p{x) = J^^ log j^\d^x{x)d^ix{y) 
is well-defined and continuous. Since p"{x) = 2 J 'j'^^^p > 0, p is convex. By Lemma p.ip the ran- 
dom matrix model exp (—nTr {p{A) + ipiji))) converges to X, and p'{x) G ]R(a;). Thus, p G M(x) is 
convex and its random matrix model converges to X. 



2.2. Conjugate Variable and Free Fisher Information Measure. 

One of the motivations for this paper was to understand Voiculescu's conjugate variable. In this
section we show that given a random matrix model $\exp(-n\,\mathrm{Tr}(p(A) + \psi(n)))$, converging to an
operator $X$ on a Hilbert space, we can construct a random matrix model for $X$ with the tangent
vector converging in moments to the conjugate variable $\partial_X^*(1 \otimes 1)$, and the Fisher information
metric converging to Voiculescu's Fisher information measure $\Phi(X)$. Then we note that an analogous
result holds for freely independent $X_1, \dots, X_k$ using [Voi98, Prop 3.6].

The calculation in Section 1.2 shows that

J x-y 

which suggests the random matrix model

$$\left\{ \exp(-n\,\mathrm{Tr}(p(A) + r p'(A) + \psi(r, n))) \right\}$$

for the conjugate variable, with $\psi(r, n) = \frac{1}{n^2}\log\int \exp(-n\,\mathrm{Tr}(p(A) + r p'(A)))\,dA$.

The tangent vector to this model at $r = 0$ is $p'(A) + \left.\frac{\partial}{\partial r}\right|_{r=0}\psi(r, n)$, and the Fisher information
metric at $r = 0$ is



^?ii(0) 



^(r,n) ) (i/io,n(^)- 

r=0 



To evaluate these expressions we need a few formulae.
Proposition 2.11. For $h \in \mathbb{R}\langle x\rangle$,

$$\int \mathrm{Tr}(h'(A))\,d\mu_{0,n}(A) = n \int \mathrm{Tr}(h(A))\,\mathrm{Tr}(p'(A))\,d\mu_{0,n}(A).$$

Proof. For a monomial, $h(A) = A^k$, we have

j Tiih' {A))dfioAA) = j Tr(A;A'=-i)rf/io,„(A) 



«i,.--,*fc-i 

k 



j=l ii,...,ik 

ij=ij+l 



x=l a;=l ^ ^ 

with the last equality due to integration by parts. 

Applying this calculation in reverse to ^j— Tr(po,n), we see that ^j— Tr(po,n) = Tr(pQ^(A)). So 
we arrive at 



$$\int \mathrm{Tr}(h'(A))\,d\mu_{0,n}(A) = n \int \mathrm{Tr}(h(A))\,\mathrm{Tr}(p_{0,n}'(A))\,d\mu_{0,n}(A).$$
By linearity of $\mathrm{Tr}$ we have the result for any $h \in \mathbb{R}\langle x\rangle$. □
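The integration-by-parts identity relating $\mathrm{Tr}(h'(A))$ to $n\,\mathrm{Tr}(h(A))\,\mathrm{Tr}(p'(A))$ can be spot-checked by Monte Carlo in the Gaussian case. The sketch below assumes $p(x) = x^2/2$ (so $p'(A) = A$) and $h(A) = A$ (so $h'(A) = I$); the left side is then exactly $n$, and the right side is $n\,E[(\mathrm{Tr} A)^2]$. The sampler, the matrix size, and the sample count are choices made for this illustration.

```python
import numpy as np

def sample_gue_batch(n, count, rng):
    """count independent n x n matrices with density proportional to exp(-n Tr(A^2 / 2))."""
    x = rng.standard_normal((count, n, n))
    y = rng.standard_normal((count, n, n))
    g = (x + 1j * y) / np.sqrt(2.0)
    h = (g + np.conj(np.transpose(g, (0, 2, 1)))) / np.sqrt(2.0)  # self-adjoint part
    return h / np.sqrt(n)

n, count = 4, 40000
rng = np.random.default_rng(0)
A = sample_gue_batch(n, count, rng)

# Traces are real for self-adjoint matrices; drop the zero imaginary part.
tr = np.real(np.trace(A, axis1=1, axis2=2))

lhs = float(n)              # integral of Tr(h'(A)) = Tr(I) = n
rhs = n * np.mean(tr * tr)  # n * E[Tr(h(A)) Tr(p'(A))] with h(A) = p'(A) = A
print("lhs:", lhs, "rhs (Monte Carlo):", rhs)
```

The agreement is only up to sampling error; a higher-degree $h$ would need more samples because the variance of the product of traces grows.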

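Now we can evaluate using Proposition 2.11:

$$\left.\frac{\partial}{\partial r}\right|_{r=0} \psi(r, n) = -\frac{1}{n}\int \mathrm{Tr}(p'(A))\,d\mu_{0,n}(A) = -\frac{1}{n^2}\int \mathrm{Tr}(1)\,\mathrm{Tr}(p'(A))\,d\mu_{0,n}(A) = -\frac{1}{n^3}\int \mathrm{Tr}(0)\,d\mu_{0,n}(A) = 0.$$

So the tangent vector at $r = 0$ is in fact $p'(A)$. By Lemma 1.1, as $n \to \infty$

$$\frac{1}{n}\int \mathrm{Tr}\left((p'(A))^k\right) d\mu_{0,n}(A) \to \int (p'(x))^k\,d\mu_X(x) = \int \left(\partial_X^*(1 \otimes 1)(x)\right)^k d\mu_X(x),$$

so the tangent vector at $r = 0$ indeed converges in moments to the conjugate variable.

Now the metric at $r = 0$ becomes

$$g_{11}(0, n) = \int \mathrm{Tr}(p'(A))\,\mathrm{Tr}(p'(A))\,d\mu_{0,n}(A).$$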

Proposition 2.12. 



Trip' (A)) Trip' (^)) dfioA^) ^ H^). 
Proof. We need a formula of [Joh98, Formula 2.18]: if $\varphi \in C^1(\mathbb{R})$ with $\varphi'$ bounded below,

(2.3) nin-1) j ^-^^^^^y^U2,„ (s, t) dtds - ra^ ^ p' (t) (t) (t) rft + n J (t) (t) rft = 0, 



Ml 
where we took N, M = P = 2, and h = in the formula, and

/n 
JJ (Aa - A;,)^ Yl ("""-P i^c)) rfAs . . . dXn, 

a<b c=l 

/n 
J]^ (Aa - Afc)^ Y\ 6xp (-np (Ac)) (iA2 . . . c/A„. 
a<fe c=l 

Notice that $d\mu_X(s) = \lim_{n\to\infty} u_{1,n}(s)$, where $d\mu_X$ is the spectral measure of $X$. Now recall that
$$n \int \varphi'(t)\, u_{1,n}(t)\,dt = \int \mathrm{Tr}(\varphi'(A))\,d\mu_{0,n}(A) = n \int \mathrm{Tr}(\varphi(A))\,\mathrm{Tr}(p'(A))\,d\mu_{0,n}(A)$$
by Proposition 2.11. Plugging this into equation (2.3), and setting $\varphi = p'$, we get

Tr {p\A)) Tr {p\A)) dfio^^) = n J p' (t) p' {t) Mi,„ (t) dt - {n - 1) J P^^^^U2,n (s, t) dtds 

p' (t) p' (t) (t) dt+{n- 1) Q p' (t) p' (t) Ui,n (t) dt- j ^-^^^3^^M2,n (s, t) dtds^ . 

If we can show that 

(2.4) " ^) (/ P' P' " / ^^^7rf^«2,n (S, t) dtds^ ^ 

then we would have 

- y Tr (p'(A)) Tr {p\A)) rf/io,n(A) = ^ (t) p' (t) (t) rft ^ 



(p'(t))^t//ix(t)= y (9i(l®l)(t))^^^/ix(t) = $(X) 
and we would be done. 

We use another fact from [Joh98, Prop. 2.6] that for any $\varphi \in C(\mathbb{R}^2)$,

$$\lim_{n\to\infty} \int \varphi(s,t)\, u_{2,n}(s,t)\,dt\,ds = \int \varphi(s,t)\,d\mu_X(s)\,d\mu_X(t).$$

Thus, we have 



p'{s)p'{s)dfix{s) = lim / p' (t) p' {t) ui^n {t) dt. 
We add and subtract this limit to equation (12. 4p to get 

(2.5) (n- 1) p' (t) p' (t) (t) dt - \im^ j p' {t) p' {t) Ui,n it) d?j - 

(2.6) {n-l)( [ tM^LP^U2 n (s, t) dtds - lim / ^li^hl^lM^^ n (s, t) t^trfs^ . 

\J t — S ' n^oo J t — S ' J 

Now for (12. Sp . notice that 



y p' {t)p' (t) Mi,„ (t) = i y Tr (p'(A)2) t//io,„(A) = ^ 



i/(s,n), 

s=0 
with $\nu(s, n) = \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(p(A) + s \cdot (p'(A))^2\right)\right) dA$. Using the expansion of [EM03] we get



i/(s, n) = eo{s) + ^ei(s) + . . . 



with ei{s) analytic. Therefore, 

lim / p' (t) p' (t) Ml „ (t) dt = lim — 

n— >oo / ' n^oo OS 



ipis,n) = eo(s), 



s=0 



and 



231) = n 



d_ 

ds 



^[s, n) — 



s=0 



9_ 

9s 



s=0 



eo(s) = n 



l_ d_ 

71^ ds 



ei s + 



s=0 



^2 S + 



s=0 



0. 



For ( 12.6p . we use |Eyn97 formula 2.5] 



M2,n(s,t) 



(2.7) 



n 



I (^E^K^)'exp(-np(s))) ( i2p,(t)2exp(-np(t)) 



n 



n — 1 



Oir. 



1 P„(s)P„_i(t)-P„_i(s)P„(t) 



s-t 



exp ( -- {,p{t)+p{s)) 



where Pq, . . . ,Pn are the monic orthogonal polynomials with respect to the measure exp {—np{t)) dt, 
and an are constants converging to a constant a which only depends on the support of the limit 
distribution. |Eyn97 , after formula 2.5] gives the expansion 



PJs] 



^L= cos «(s) + g{s)) ■ exp (^p(s)) , 



1 fn — I 

Pn-i{s) = cos «(s) + ip{s) + g{s)) ■ exp \^-^p{s] 

where /(s), ^(s), (/'(s), and ^'(s) are functions of s. We are only concerned with the order n 
expansion, so we do not need all these functions explicitly, but Eynard notes (in formula (2.11)) 
that /(s) = ■\/(s — a) (6 — s). Recognizing the first summand in 02. 7p as ■ 'Ui,n(s) ■ Wi,n(^) and 
rewriting the second summand using the expansion, we have 



Ti Tl I \ 
M2,n(s, t) = ■ Mi,„(s) ■ Mi,„(t) r I « 



n — 1 



n — 1 



X 



cos «(s) + g{s)) cos «(t) + ip{t) + g{t)) - cos «(t) + ^(t)) cos «(s) + ip{s) + 5f(s)) 

s-t 

Now —1 < cos (■) < 1, so for s,t satisfying |s — 1| > n~^/^, |s — a| > n~^^^ and |s — 6| > n~^^^, 
the absolute value of the second summand is bounded by 

n - 1 




The first summand converges to $d\mu_X(s)\,d\mu_X(t)$, so we see that the limit of $u_{2,n}(s,t)$ is $d\mu_X(s)\,d\mu_X(t)$
a.e. $ds\,dt$. In equation (2.6) we have subtracted $\lim u_{2,n}(s,t)$, so we have

f p'(t) - p'(s) f n \ 
(l2?6ll ={n - 1) / ■ -uiJs)uiJt)dsdt - lim uiJs)uiJt)dsdt + 

J t — S \n — 1 n^co J 

. [ P'it)-p'{s) n f 1 1 \\, 

cos «(g) + x(-s)) cos + (y9(t) + xif)) - COS + xit)) COS «(■§) + v?(s) + x(g)) 

In our calculation that (2.5) $\to 0$, we showed that $n\left(u_{1,n} - \lim u_{1,n}\right) \to 0$, so

$$n\left(u_{1,n}(s)\,u_{1,n}(t) - \lim_{n\to\infty} u_{1,n}(s)\,u_{1,n}(t)\right) \to 0.$$

Thus, the first summand in (2.6) $\to 0$.

For s, t satisfying |s — t| > n^^/^, |s — a| > n^^/^ and |s — 6| > n^^^^^ 

nil 1 \^ 

a„ ^— . ^ X 



n 



cos «(s) + cos «(t) + ipit) + - cos «(t) + cos + V2(s) + 



s-t 

n 



< 



n — 1 

Therefore, (2.6) $\to 0$. So (2.4) $\to 0$, and we have proved the proposition. □



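Remark 2.13. By [Voi98, Prop 3.6], given $X_1, \dots, X_k$ freely independent, the conjugate variable
$\partial^*_{X_i}(1\otimes 1)$ computed in the larger algebra $(\mathbb{R}\langle X_1, \dots, X_k\rangle, \tau)$ satisfies $\partial^*_{X_i}(1\otimes 1) \in L^2\left(\mathbb{R}\langle X_i\rangle, \tau|_{\mathbb{R}\langle X_i\rangle}\right)$.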

Thus, 9^.(1 ® 1) = I / -^Y^z^- Thus, given Xi, . . . ,Xfc freely independent, with Xj the limit of 
the random matrix model $\exp\left(-n\,\mathrm{Tr}\left(p_i(A)\right)\right)$, we consider the following independent multi-matrix
model (discussed in Section 3.1)

$$\exp\left(-n\,\mathrm{Tr}\left(\sum_{i=1}^{k} p_i(A_i) + \sum_{j=1}^{k} t_j\,p_j'(A_j) + \psi(t,n)\right)\right).$$

Since the random matrices $A_1, \dots, A_k$ are independent, they are asymptotically free, and since
$A_i$ is distributed according to $\exp\left(-n\,\mathrm{Tr}\left(p_i(A)\right)\right)$, the joint distribution of $(A_1, \dots, A_k)$ converges
to the joint distribution of $(X_1, \dots, X_k)$. Also, the tangent vectors $p_j'(A_j)$ converge to $p_j'(X_j)$, which
is $\partial^*_{X_j}(1\otimes 1)$.

According to the discussion in Section 3.1, the off-diagonal terms in the metric vanish and we
have

$$g_{ii}(0,n) = \int \mathrm{Tr}\left(p_i'(A_i)\right)\mathrm{Tr}\left(p_i'(A_i)\right)d\mu_{t,n}(A_i).$$

Applying Proposition 2.12 we get

$$\int \mathrm{Tr}\left(p_i'(A_i)\right)\mathrm{Tr}\left(p_i'(A_i)\right)d\mu_{t,n}(A_i) \to \Phi(X_i).$$

Therefore,

$$g_{ij}(0,n) \to \begin{pmatrix}\Phi(X_1) & & \\ & \ddots & \\ & & \Phi(X_k)\end{pmatrix}.$$



3. Cramer-Rao Theorem

3.1. Independent Observations.


3. Cramer-Rao Theorem 

3.1. Independent Observations. 

In Section 3.2 we will prove the Cramer-Rao theorem, which requires us first to make sense of
independent observations and efficient estimators.

Classically, given a random variable $X$ whose distribution belongs to a model
$S = \{\exp\left(p_\theta(x)\right)dx \mid \theta \in \Theta \subset \mathbb{R}^m\}$, a common problem is to estimate the value of $\theta$ based on several
independent observations of $X$.

Recall that a random variable is a real Borel function on some probability space, so an observation
of $X$ is simply a real number, and $k$ observations of $X$ together form a vector in $\mathbb{R}^k$. The requirement
that the observations are independent means that $(x_1, \dots, x_k)$ is an observation of the random
variable $X^{\otimes k}$ (shorthand notation for $((X\otimes 1\otimes\dots\otimes 1), \dots, (1\otimes\dots\otimes 1\otimes X))$, which is $k$ independent
copies of $X$). Note that the distribution of $X^{\otimes k}$ belongs to the model $S^{\otimes k}$, and the metric on $S^{\otimes k}$
was calculated in Section 1.5 to be the direct sum of the metrics on each copy of $S$.

An estimator is a collection of functions $\xi_k : \mathbb{R}^k \to \mathbb{R}^m$, $k = 1, 2, \dots$, used to estimate the value
of $\theta \in \mathbb{R}^m$ based on $k$ independent observations.

An unbiased estimator is an estimator such that if $X$ is distributed according to $\exp\left(p_\theta(x)\right)dx$,
then the average of $\xi_k$ taken over all $k$ independent observations of $X$ is equal to $\theta$. In concrete
terms, independent observations of $X$ means that $\theta(1) = \dots = \theta(k)$, so the requirement for an
unbiased estimator becomes

$$\int \xi_k(x_1, \dots, x_k)\,\exp\left(\sum_{r=1}^{k} p_\theta(x_r)\right)dx_1 \dots dx_k = \theta.$$



Given an unbiased estimator and $k$ independent observations $(x_1, \dots, x_k)$, the error of the estimate is

$$e_{\theta,k}(x_1, \dots, x_k) = \xi_k(x_1, \dots, x_k) - \theta.$$

We may calculate the covariance matrix for the entries of the error

$$\left(\mathrm{Cov}(e_{\theta,k})\right)_{ij} = \int \left(e_{\theta,k}(x_1, \dots, x_k)\right)_i \left(e_{\theta,k}(x_1, \dots, x_k)\right)_j \exp\left(\sum_{r=1}^{k} p_\theta(x_r)\right)dx_1 \cdots dx_k.$$

The classical Cramer-Rao Theorem [AN00] gives a lower bound on this covariance:

Theorem 3.1. Given a model $S$ and an unbiased estimator $\xi_k$,

$$\mathrm{Cov}(e_{\theta,k}) \ge g^{-1}(\theta)$$

where $g$ is the Fisher Information metric on $S^{\otimes k}$, and $\ge$ is in the sense of positive semi-definite
matrices.
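
As an illustration of Theorem 3.1 (not part of the original argument), the following minimal sketch checks the bound numerically in the simplest classical case. It assumes the Gaussian location model $p_\theta(x) = -\frac{(x-\theta)^2}{2} - \log\sqrt{2\pi}$, for which the Fisher information of $k$ independent observations is $k$, and it takes the sample mean as the estimator $\xi_k$. The function names and the particular values of $\theta$, $k$, and the number of trials are hypothetical choices made for this example only.

```python
import random
import statistics


def sample_mean_estimates(theta, k, trials, seed=0):
    """Draw `trials` independent k-samples from N(theta, 1); return their sample means."""
    rng = random.Random(seed)
    return [
        statistics.fmean([rng.gauss(theta, 1.0) for _ in range(k)])
        for _ in range(trials)
    ]


def cramer_rao_bound(k):
    """Inverse Fisher information of k independent N(theta, 1) observations."""
    return 1.0 / k


# Hypothetical parameter values chosen only for illustration.
theta, k = 0.7, 5
estimates = sample_mean_estimates(theta, k, trials=20000)

# Empirical bias and empirical variance of the error xi_k - theta.
bias = statistics.fmean(estimates) - theta
error_variance = statistics.fmean([(e - theta) ** 2 for e in estimates])

print(f"empirical bias:      {bias:+.4f}")
print(f"empirical variance:  {error_variance:.4f}")
print(f"Cramer-Rao bound:    {cramer_rao_bound(k):.4f}")
```

In this model the sample mean is unbiased and its error variance should sit close to the bound $1/k$, up to Monte Carlo noise, which is the classical situation of an efficient estimator.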



In the random matrix case, we must first make sense of independent observations. Recall that a
random matrix $A$ is a matrix-valued random variable, so

Definition 3.2. An observation of $A$ is simply a matrix $a \in M_n^{sa}(\mathbb{C})$, and $k$ observations of $A$
together form a vector $(a_1, \dots, a_k) \in \left(M_n^{sa}(\mathbb{C})\right)^k$.

Definition 3.3. $a_1, \dots, a_k$ are $k$ independent observations of $A$ if $(a_1, \dots, a_k)$ is an observation of
$A^{\otimes k}$ ($k$ independent copies of $A$).



If the distribution of $A$ belongs to the model $S = \{\exp\left(-n\,\mathrm{Tr}\left(p_{\theta,n}(A)\right)\right)dA\}$, then the distribution
of $A^{\otimes k}$ belongs to the model $S^{\otimes k}$, which was discussed in Section 1.5. We write $p_{\theta,n}(A_1, \dots, A_k)$
for $p_{\theta,n}(A_1) + \dots + p_{\theta,n}(A_k)$, and $d\mu_{\theta,n}(A_1, \dots, A_k)$ for $d\mu_{\theta,n}(A_1) \cdots d\mu_{\theta,n}(A_k)$.

The Cramer-Rao theorem also requires us to define unbiased estimators. Recall from our previous 
discussion that an estimator is a function on several independent copies of the random variable, 
such that its expectation gives the estimated parameter value. In addition, the proof of the classical 
Cramer-Rao rests on the fact that an estimator may be viewed as a member of the tangent space 
of the model. With these requirements in mind, we define 

Definition 3.4. An estimator is a collection of functions $(\xi_k)_i \in \mathbb{R}\langle x_1, \dots, x_k\rangle$, where $k$ specifies
the number of observations and $i$ specifies the parameter to be estimated. Given observations
$A_1, \dots, A_k$, $\frac{1}{n}\mathrm{Tr}\left(\xi_k(A_1, \dots, A_k)_i\right)$ is an estimate of $\theta_i$.

Definition 3.5. An unbiased estimator is an estimator such that

$$\frac{1}{n}\int \mathrm{Tr}\left(\xi_k(A_1, \dots, A_k)_i\right)d\mu_{\theta,n}(A_1, \dots, A_k) = \theta_i.$$

For an unbiased estimator and $k$ independent observations $(A_1, \dots, A_k)$, we have the error of the
$i$-th estimate

$$\left(e_{k,\theta}(A_1, \dots, A_k)\right)_i = \left(\xi_k(A_1, \dots, A_k) - \theta\right)_i,$$

and we may compute the covariance matrix of the entries of the error using our inner-product

Gov {ek,e) = " / Tr {ek,g (Ai, . . . , Ak),) Tr (^ek,e (A,,..., A^) .) dfleA^u • • • , ^fc). 
We will bound this quantity in the Cramer-Rao theorem. 



3.2. Cramer-Rao Theorem. 

In this section we prove the Cramer-Rao theorem for the random matrix model. First we need a 
proposition: 

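Proposition 3.6. Given $f \in \mathbb{R}\langle x_1, \dots, x_k\rangle$ a symmetric function, let $H : S^{\otimes k} \to \mathbb{R}$ be given by
$\theta \mapsto -\frac{1}{n}\int \mathrm{Tr}\left(f(A_1, \dots, A_k)\right)d\mu_{\theta,n}(A_1, \dots, A_k)$. Then

$$\langle f + H(\theta), f + H(\theta)\rangle_\theta \ge \langle dH, dH\rangle_\theta,$$

with equality if and only if $f \in T_\theta S^{\otimes k}$.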
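Proof. First, consider the enlarged model

$$\tilde S = \left\{\exp\left(-n\,\mathrm{Tr}\left(\sum_{r=1}^{k} p(A_r) + \sum_{r=1}^{k}\sum_{i=1}^{m}\theta_i F_i(A_r) + t f(A_1, \dots, A_k) + \tilde\psi(\theta,t,n)\right)\right)\right\}$$

with $\tilde\psi(\theta,t,n) = \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(\sum_{r=1}^{k} p(A_r) + \sum_{r=1}^{k}\sum_{i=1}^{m}\theta_i F_i(A_r) + t f(A_1, \dots, A_k)\right)\right)dA$.
We have

$$\langle dH, dH\rangle\big|_{\theta\in S^{\otimes k}} \le \langle dH, dH\rangle\big|_{(\theta,0)\in\tilde S},$$

with equality if and only if $\tilde S = S^{\otimes k}$. Let $\tilde\theta = (\theta, t)$, and let $P_{\tilde\theta,n}(A_1, \dots, A_k)$ denote the potential
in $\tilde S$. For any $X \in T_{(\theta,0)}\tilde S$, we can differentiate $H$ with respect to $X$ to get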

i=l \ / (6»,0) ddi .^^ \ dOi I (£1^0) \ <^Qi I (6»,0) 

=f:(^.-^-) (^■^+^('')) =(/+^f('').-^v,o). 

\ I (61,0) \ <J^i I (e,o) 

where the next to last equality is due to the fact that 

' dP- \ d C 

H{e)^ = H{e)^ j exp {-uTr (P,-_„ (Ai, . . . , A,))) dA = 0. 

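The equation $X\cdot H(\theta) = \langle f + H(\theta), X\rangle_\theta$ is the definition of $\mathrm{grad}(H)$ in $T_{(\theta,0)}\tilde S$, and combined
with the fact that $f + H(\theta) \in T_{(\theta,0)}\tilde S$ it shows that

$$f + H(\theta) = \mathrm{grad}(H) \text{ in } T_{(\theta,0)}\tilde S.$$

We have

$$\langle dH, dH\rangle\big|_{\theta\in S^{\otimes k}} \le \langle dH, dH\rangle\big|_{(\theta,0)\in\tilde S} = \langle \mathrm{grad}(H), \mathrm{grad}(H)\rangle\big|_{(\theta,0)\in\tilde S} =$$

$$\langle f + H(\theta), f + H(\theta)\rangle\big|_{(\theta,0)\in\tilde S} = \langle f + H(\theta), f + H(\theta)\rangle\big|_{\theta\in S^{\otimes k}},$$

with equality if and only if $\tilde S = S^{\otimes k}$.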

□ 



Now we prove the Cramer-Rao theorem: 
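Theorem 3.7. Given an exponential family $S$ and an unbiased estimator $\xi_k$ on $S$, it satisfies

$$\left(\langle(\xi_k - \theta)_i, (\xi_k - \theta)_j\rangle_\theta\right)_{ij} \ge G^{-1}(\theta, n)$$

in the sense of positive semi-definite matrices, where $\langle\cdot,\cdot\rangle_\theta$ is the inner-product on $S^{\otimes k}$, and $G$ is
the Fisher Information metric on $S^{\otimes k}$.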

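Proof. To show that $\left(\langle(\xi_k - \theta)_i, (\xi_k - \theta)_j\rangle_\theta\right)_{ij} - G^{-1}(\theta, n)$ is positive semi-definite, fix an arbitrary vector
$c \in \mathbb{R}^m$, and let

$$f(A_1, \dots, A_k) = \sum_{i=1}^{m} c_i\left(\xi_k(A_1, \dots, A_k)\right)_i.$$

Write $S^{\otimes k} = \left\{\exp\left(-n\,\mathrm{Tr}\left(\sum_{r=1}^{k} p_{\theta,n}(A_r)\right)\right)\right\}$, $p_{\theta,n}(A_1, \dots, A_k)$ for $\sum_{r=1}^{k} p_{\theta,n}(A_r)$, $d\mu_{\theta,n}(A_1, \dots, A_k)$
for $d\mu_{\theta,n}(A_1)\cdots d\mu_{\theta,n}(A_k)$, and denote the expectation by $E_\theta(f) = \frac{1}{n}\int \mathrm{Tr}\left(f(A_1, \dots, A_k)\right)d\mu_{\theta,n}(A_1, \dots, A_k)$.
This model was discussed in Section 1.5; denote its metric by $G_{ij}(\theta, n)$. Define $H(\theta) = -E_\theta(f)$ as
in Proposition 3.6.

Because $\xi_k$ is unbiased, $H(\theta) = -\sum_{i=1}^{m} c_i\theta_i$, so $\frac{\partial}{\partial\theta_i}H(\theta) = -c_i$, and Proposition 3.6 shows that

$$\langle dH, dH\rangle_\theta \le \langle f + H(\theta), f + H(\theta)\rangle_\theta.$$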




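The left-hand side is

$$\langle dH, dH\rangle_\theta = \sum_{i,j=1}^{m} \frac{\partial}{\partial\theta_i}\left(\sum_{r=1}^{m} c_r E_\theta\left((\xi_k)_r\right)\right)\frac{\partial}{\partial\theta_j}\left(\sum_{s=1}^{m} c_s E_\theta\left((\xi_k(A_1, \dots, A_k))_s\right)\right)\left(G^{-1}(\theta, n)\right)_{ij} =$$

$$\sum_{i,j=1}^{m} \frac{\partial}{\partial\theta_i}\left(\sum_{r=1}^{m} c_r\theta_r\right)\frac{\partial}{\partial\theta_j}\left(\sum_{s=1}^{m} c_s\theta_s\right)\left(G^{-1}(\theta, n)\right)_{ij} = \sum_{i,j=1}^{m} c_i c_j\left(G^{-1}(\theta, n)\right)_{ij},$$

and the right-hand side is

$$\langle f + H(\theta), f + H(\theta)\rangle_\theta = \sum_{i,j=1}^{m} c_i c_j\,\langle(\xi_k - \theta)_i, (\xi_k - \theta)_j\rangle_\theta.$$

Therefore, we have shown that

$$c^t\left(\langle(\xi_k - \theta)_i, (\xi_k - \theta)_j\rangle_\theta\right)_{ij} c \ge c^t\left(G^{-1}(\theta, n)\right)_{ij} c,$$

with equality if and only if $(\xi_k - \theta)_i \in T_\theta S^{\otimes k}$ for $i = 1, \dots, m$.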



□ 



Next we prove the converse to the Cramer-Rao theorem, which requires a definition: 

Definition 3.8. An efficient estimator is an unbiased estimator $\xi_k$ that attains the bound in the
Cramer-Rao theorem.



Theorem 3.9. A random matrix model $S = \{\exp\left(-n\,\mathrm{Tr}\left(Q_{\theta,n}(A)\right)\right)\}$, with $Q_\theta \in \mathbb{R}\langle x\rangle$, has efficient
estimators $\hat\theta_1, \dots, \hat\theta_m$ if and only if it is an exponential family, i.e.

$$Q_\theta(A) = p(A) + \sum_{i=1}^{m}\theta_i F_i(A) + \psi(\theta, n).$$

Proof. ($\Leftarrow$) Suppose $S$ is an exponential family. Recall that we defined the dual coordinate system
$\eta_i = -\frac{\partial}{\partial\theta_i}\psi(\theta, n)$. Thus, we may write the tangent vectors as $F_i - \eta_i$, and the Fisher information
metric as

$$g_{ij}(\theta, n) = \langle(F_i - \eta_i), (F_j - \eta_j)\rangle_\theta.$$

This equation shows that

$$\hat\eta_i(A_1, \dots, A_k) = \frac{1}{k}\sum_{r=1}^{k} F_i(A_r)$$

is an efficient estimator for $\eta_i$. Thus, we have found a coordinate system $\eta_1, \dots, \eta_m$ for the exponential
family which has efficient estimators $\hat\eta_1, \dots, \hat\eta_m$.
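($\Rightarrow$) Given $S$ with efficient estimators $\hat\theta_1(A_1, \dots, A_k), \dots, \hat\theta_m(A_1, \dots, A_k)$, fix $k = 1$. Since the
estimators are efficient, they attain equality in the Cramer-Rao theorem:

$$\left(\langle(\hat\theta(A) - \theta)_i, (\hat\theta(A) - \theta)_j\rangle_\theta\right)_{ij} = G^{-1}(\theta, n),$$

and this implies that $\hat\theta_i(A) - \theta_i \in T_\theta S$ by the equality in Proposition 3.6. So we have $m$ linearly
independent vector fields on $S$. To see that they are parallel with respect to the (1)-connection, fix
$\theta'$ and consider the model

$$S' = \left\{\exp\left(-n\,\mathrm{Tr}\left(Q_{\theta'}(A) + \sum_{i=1}^{m}\zeta_i F_i(A) + \varphi(\zeta, n)\right)\right)\right\}$$

with $\varphi(\zeta, n) = \frac{1}{n^2}\log\int \exp\left(-n\,\mathrm{Tr}\left(Q_{\theta'}(A) + \sum_{i=1}^{m}\zeta_i F_i(A)\right)\right)dA$. Denote

$$p_{\zeta,n}(A) = Q_{\theta'}(A) + \sum_{i=1}^{m}\zeta_i F_i(A) + \varphi(\zeta, n).$$

This is a submanifold of the original model, and since $F_1, \dots, F_m$ are linearly independent

$$T_{\zeta=0}S' = T_{\theta=\theta'}S.$$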



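Now

$$\frac{1}{n}\int \mathrm{Tr}\left(F_i(A)\right)\exp\left(-n\,\mathrm{Tr}\left(p_{\theta',n}(A)\right)\right)dA = \frac{1}{n}\int \mathrm{Tr}\left(\hat\theta_i(A) - \theta'_i\right)\exp\left(-n\,\mathrm{Tr}\left(p_{\theta',n}(A)\right)\right)dA = 0,$$

because $\hat\theta$ is an unbiased estimator. Thus,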



dpc,n 



C=o 



exp (-nTr {A)))dA 



C=o> 



1 



n 



Tr (F, {A)) exp (-nTr (pe> {A))) dA = Q 



again because $\hat\theta$ is an unbiased estimator. Now we calculate the (1)-connection for the $\zeta$ coordinate
system at $\zeta = 0$:



ijk 



(0) 



Q^^PCnj Tr y-Q^Pc,n j exp (-nTr {pg>^n (A))) dA 

j Tr (^Pc,n) exp (-nTr {pe>,n {A))) dA = 0. 
Therefore, $\zeta$ is a (1)-flat coordinate system for $S$ at $\theta'$ for $\theta' \in S$, so we may write

$$Q_{\theta,n}(A) = Q_{\theta',n}(A) + \sum_{i=1}^{m}\zeta_i\left(\hat\theta_i(A) - \theta'_i\right) + \varphi(\zeta, n).$$



dCkdCi 



□ 
3.3. Relation to the Free Cramer-Rao.

In this section we compare the Cramer-Rao Theorem 3.7 as $n \to \infty$ to Voiculescu's Free Cramer-Rao
Theorems. We recall the free Cramer-Rao theorem in the one-variable case [Voi98]:

Theorem 3.10. Given $v \ge 0$, $v \in L^1(\mathbb{R}) \cap L^3(\mathbb{R})$, $\int v(x)\,dx = 1$, and $\int x^2 v(x)\,dx < \infty$, let
$x_0 = \int x\,v(x)\,dx$, then

$$\left(\int v(x)^3\,dx\right)\left(\int (x - x_0)^2\,v(x)\,dx\right) \ge \frac{3}{4\pi^2}.$$
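
As a numerical sanity check of Theorem 3.10 (an illustration added here, not a step of the paper's argument), the sketch below evaluates both factors for the semicircle density of variance 1, $v(x) = \frac{1}{2\pi}\sqrt{4 - x^2}$ on $[-2, 2]$. For this density $\int v^3 = \frac{3}{4\pi^2}$ and the variance is 1, so the product should sit at the bound. The helper names and the midpoint-rule integrator are assumptions made for this example.

```python
import math


def semicircle_density(x):
    """Semicircle density of variance 1, supported on [-2, 2]."""
    return math.sqrt(max(4.0 - x * x, 0.0)) / (2.0 * math.pi)


def integrate(func, a, b, steps=200000):
    """Midpoint-rule approximation of the integral of func over [a, b]."""
    h = (b - a) / steps
    return h * sum(func(a + (i + 0.5) * h) for i in range(steps))


v = semicircle_density
mass = integrate(v, -2.0, 2.0)
x0 = integrate(lambda x: x * v(x), -2.0, 2.0)
third_moment_of_density = integrate(lambda x: v(x) ** 3, -2.0, 2.0)
variance = integrate(lambda x: (x - x0) ** 2 * v(x), -2.0, 2.0)

lhs = third_moment_of_density * variance
bound = 3.0 / (4.0 * math.pi ** 2)

print(f"total mass:          {mass:.6f}")
print(f"left-hand side:      {lhs:.6f}")
print(f"bound 3/(4 pi^2):    {bound:.6f}")
```

Up to quadrature error, the two printed numbers agree, which is consistent with the semicircle law being the equality case of the inequality.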



In this theorem $f(x) = x$ plays the role of an estimator. To put this in the context of the random
matrix model, consider a random matrix $A \sim \exp\left(-n\,\mathrm{Tr}\left(p(A)\right)\right)dA$, with limit distribution $v(x)$.
In Section 2.2 we showed that Voiculescu's $\Phi$ corresponds to embedding $A$ in the model

$$\exp\left(-n\,\mathrm{Tr}\left(p(A) + t\,p'(A) + \psi(t, n)\right)\right),$$

and considering the Fisher information metric at $t = 0$. The estimator is $f(A) = A$. To apply our
Cramer-Rao theorem, we must assume that $f$ is an unbiased estimator at $t = 0$:

$$\frac{1}{n}\int \mathrm{Tr}\left(f(A)\right)\exp\left(-n\,\mathrm{Tr}\left(p(A)\right)\right)dA = 0.$$

Our Cramer-Rao theorem says

$$\langle f(A), f(A)\rangle_{t=0} \ge G^{-1}(0, n),$$

where $G(t, n)$ is the metric on our model. In Section 2.2 we showed that $G(0, n) \to \Phi(v(x)\,dx)$.
Thus, we have

lim ( - / Ti{A)Tr{A)exp{-nTi{p{A)))dA] > (^-\v{x)dx). 

n^oo \ J J 

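Now, since $\frac{1}{n}\mathrm{Tr}\left(\left(A - \frac{1}{n}\mathrm{Tr}(A)\right)^2\right) = \frac{1}{n}\mathrm{Tr}\left(A^2\right) - \frac{2}{n^2}\mathrm{Tr}(A)\,\mathrm{Tr}(A) + \frac{1}{n^2}\mathrm{Tr}(A)\,\mathrm{Tr}(A)$, and $\frac{1}{n}\mathrm{Tr}\left(\left(A - \frac{1}{n}\mathrm{Tr}(A)\right)^2\right) \ge 0$, and $\frac{1}{n^2}\mathrm{Tr}(A)\,\mathrm{Tr}(A) \ge 0$, we have

$$\frac{1}{n^2}\mathrm{Tr}(A)\,\mathrm{Tr}(A) \le \frac{1}{n}\mathrm{Tr}\left(A^2\right).$$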
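
The trace inequality can be checked directly on sample matrices. The following sketch is only an illustration: it assumes real symmetric matrices with independent standard Gaussian entries (a convenient test distribution, not the model of the paper), and the helper names are hypothetical. It verifies that $\frac{1}{n}\mathrm{Tr}(A^2) - \left(\frac{1}{n}\mathrm{Tr}(A)\right)^2$ equals the normalized trace of the square of $A$ minus its mean diagonal, and is therefore non-negative.

```python
import random


def random_symmetric(n, rng):
    """Real symmetric n x n matrix with independent N(0, 1) entries on and above the diagonal."""
    a = [[0.0] * n for _ in range(n)]
    for i in range(n):
        for j in range(i, n):
            a[i][j] = a[j][i] = rng.gauss(0.0, 1.0)
    return a


def trace(a):
    return sum(a[i][i] for i in range(len(a)))


def trace_of_square(a):
    n = len(a)
    return sum(a[i][j] * a[j][i] for i in range(n) for j in range(n))


def normalized_gap(a):
    """(1/n) Tr(A^2) - ((1/n) Tr(A))^2, which the identity above shows is non-negative."""
    n = len(a)
    return trace_of_square(a) / n - (trace(a) / n) ** 2


def centered_second_moment(a):
    """(1/n) Tr((A - (Tr(A)/n) I)^2), computed directly from the centered matrix."""
    n = len(a)
    c = trace(a) / n
    centered = [[a[i][j] - (c if i == j else 0.0) for j in range(n)] for i in range(n)]
    return trace_of_square(centered) / n


rng = random.Random(0)
samples = [random_symmetric(n, rng) for n in (1, 2, 5, 10) for _ in range(50)]
gaps = [normalized_gap(a) for a in samples]
identity_errors = [abs(normalized_gap(a) - centered_second_moment(a)) for a in samples]

print(f"smallest gap:            {min(gaps):.3e}")
print(f"largest identity error:  {max(identity_errors):.3e}")
```

The gap vanishes only when $A$ is a multiple of the identity, which is why the $n = 1$ samples give a gap of zero up to rounding.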

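Thus, our Cramer-Rao implies that

$$\lim_{n\to\infty}\frac{1}{n}\int \mathrm{Tr}\left(A^2\right)\exp\left(-n\,\mathrm{Tr}\left(p(A)\right)\right)dA \ge \Phi^{-1}(v(x)\,dx),$$

which is precisely

$$\int x^2\,v(x)\,dx \ge \frac{3}{4\pi^2}\left(\int \left(v(x)\right)^3 dx\right)^{-1}.$$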



The several variable case result [Voi98] states that

Theorem 3.11. Given $X_1, \dots, X_k \in (\mathcal{A}, \tau)$ with $(\mathcal{A}, \tau)$ a tracial unital von Neumann algebra,

$$\tau\left(X_1^2 + \dots + X_k^2\right) \ge k^2\cdot\Phi^{-1}(X_1, \dots, X_k).$$

To compare it to our theorem, we must assume that $(X_1, \dots, X_k)$ are freely independent, and
that we have random matrices $A_1, \dots, A_k$, with $A_r$ converging to $X_r$, and $A_r$ distributed according
to $Z^{-1}\exp\left(-n\,\mathrm{Tr}\left(p^{(r)}(A)\right)\right)$. Following Section 1.5, $\exp\left(-n\,\mathrm{Tr}\left(\sum_{r=1}^{k} p^{(r)}(A_r)\right)\right)$ converges to
$(X_1, \dots, X_k)$. We embed this in the model



5 = <^ exp -riTv J]p(r) (A,) +t (r) (A,) + ^/^ (t, 

t \ \r=l r=l 

and the discussion in Section 2.2 shows that

$$\Phi(X_1, \dots, X_k) = \sum_{r=1}^{k}\lim_{n\to\infty} G_{rr}(0, n).$$

The estimator in Voiculescu's theorem is $f(A_1, \dots, A_k) = \frac{1}{k}\sum_{r=1}^{k} A_r$; we must assume it is
unbiased at $t = 0$:



Now our Cramer-Rao theorem says 

k \ /, fc 



j^'[ltl ^ J Tr ^ Asj exp |^-nTr p (r) (A,) j j rfyli . . . dA, > ^-^(0, 



n) 



Since $A_r$ is independent of $A_s$ for $r \ne s$, this is equivalent to



^ 5^ y Tr (A,) Tr (A,) exp ^-nTr (^^Pir) (AO j j • • • dA^ > G-\0, n). 



k 

This is equivalent to 



lim Ylj^^ (A) Tr (A,) exp ^-n Tr (r) (A^) j j 

Again, since — Tr (Ag) Tr (Ag) < ^ Tr {A^) for s = 1, . . . , A;, we obtain 



dAi...dAk>e (Xi,...,Xfc). 



lim Tr(A2)exp ^-nTr ^^p(r) (v4^)j j 



which is

$$\tau\left(X_1^2 + \dots + X_k^2\right) \ge k^2\cdot\Phi^{-1}(X_1, \dots, X_k).$$



3.4. Relation to Second-Order Freeness. 

In this section we show that the quantities which motivated Speicher's theory of second-order 
freeness are naturally related to the geometry of the random matrix model. 

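Given a random matrix $A \sim \exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(n)\right)\right)$, and a collection of functions $F_1, \dots, F_m \in \mathbb{R}\langle x\rangle$,
[MS06] provide several definitions for the fluctuations of $F_1(A), \dots, F_m(A)$, and we use one
which is convenient for us:

Definition 3.12. The fluctuations of $F_1(A), \dots, F_m(A)$ are (if they exist)

$$\beta_{ij} = \lim_{n\to\infty}\int \mathrm{Tr}\left(F_i(A) - \alpha_i(n)\right)\mathrm{Tr}\left(F_j(A) - \alpha_j(n)\right)\exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(n)\right)\right)dA,$$

with

$$\alpha_i(n) = \frac{1}{n}\int \mathrm{Tr}\left(F_i(A)\right)\exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(n)\right)\right)dA.$$

[MS06] have developed a general framework of conditions on an algebra so that the fluctuations
of any of its elements may be calculated; they call it Second-Order Freeness.

Consider the random matrix model

$$\exp\left(-n\,\mathrm{Tr}\left(p(A) + \sum_{i=1}^{m}\theta_i F_i(A) + \psi(\theta, n)\right)\right).$$

For small enough $\theta$, this model converges, and in particular we have at $\theta = 0$:

$$g_{ij}(0, n) = \int \mathrm{Tr}\left(F_i + \frac{\partial}{\partial\theta_i}\Big|_{\theta=0}\psi\right)\mathrm{Tr}\left(F_j + \frac{\partial}{\partial\theta_j}\Big|_{\theta=0}\psi\right)\exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(0, n)\right)\right)dA.$$

Since

$$\frac{\partial}{\partial\theta_i}\Big|_{\theta=0}\psi(\theta, n) = -\frac{1}{n}\int \mathrm{Tr}\left(F_i\right)\exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(0, n)\right)\right)dA = -\alpha_i(n),$$

we have

$$g_{ij}(0, n) = \int \mathrm{Tr}\left(F_i - \alpha_i(n)\right)\mathrm{Tr}\left(F_j - \alpha_j(n)\right)\exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(0, n)\right)\right)dA.$$

In Section 2 we showed that this metric converges as $n \to \infty$, so we have

$$g_{ij}(0) = \lim_{n\to\infty}\int \mathrm{Tr}\left(F_i - \alpha_i(n)\right)\mathrm{Tr}\left(F_j - \alpha_j(n)\right)\exp\left(-n\,\mathrm{Tr}\left(p(A) + \psi(0, n)\right)\right)dA.$$

These are precisely the fluctuations of $F_1(A), \dots, F_m(A)$.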
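
To make Definition 3.12 concrete, here is a minimal Monte Carlo sketch for the simplest case. It assumes the Gaussian potential $p(A) = A^2/2$ on Hermitian matrices and the single function $F(A) = A$; these choices, and all function names, are assumptions made for this illustration. Under the density proportional to $\exp\left(-n\,\mathrm{Tr}\left(A^2/2\right)\right)$ the diagonal entries are independent $N(0, 1/n)$ variables, so only the diagonal needs to be sampled to obtain $\mathrm{Tr}(A)$.

```python
import random
import statistics


def sample_trace(n, rng):
    """Tr(A) for A distributed as exp(-n Tr(A^2 / 2)) dA: a sum of n independent N(0, 1/n) diagonal entries."""
    sigma = (1.0 / n) ** 0.5
    return sum(rng.gauss(0.0, sigma) for _ in range(n))


def trace_fluctuation(n, trials=20000, seed=0):
    """Monte Carlo estimate of the integral of Tr(F - alpha(n)) Tr(F - alpha(n)) for F(A) = A."""
    rng = random.Random(seed)
    traces = [sample_trace(n, rng) for _ in range(trials)]
    alpha = statistics.fmean(traces) / n  # empirical alpha(n) = (1/n) E[Tr(A)]
    return statistics.fmean([(t - n * alpha) ** 2 for t in traces])


# Hypothetical matrix sizes chosen only for illustration.
fluctuations = {n: trace_fluctuation(n) for n in (2, 10, 50)}
for n, value in fluctuations.items():
    print(f"n = {n:3d}: estimated fluctuation = {value:.4f}")
```

In this Gaussian case the quantity equals 1 for every $n$, so the estimates should stay near 1 up to sampling noise. Since $p'(A) = A$ here, this is also consistent with the free Fisher information of the semicircle law of variance 1, which equals 1.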

Remark 3.13. We have shown that the fluctuations of a random matrix may be considered
as tangent vectors of the random matrix model obtained by perturbing the potential with the
fluctuation functions, and that the inner-product of the fluctuations is the metric on this model.
Consequently, the fluctuations of a random matrix give rise to a positive-definite form.